NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

WeGeFT: Weight-Generative Fine-Tuning for Multi-Faceted Efficient Adaptation of Large Models

Savadikar, Chinmay; Song, Xi; Wu, Tianfu (May 2025, Proceedings of the 42 nd International Conference on Machine Learning, Vancouver, Canada. PMLR 267, 2025.)

Fine-tuning large pretrained Transformer models can focus on either introducing a small number of new learnable parameters (parameter efficiency) or editing representations of a small number of tokens using lightweight modules (representation efficiency). While the pioneering method LoRA (Low-Rank Adaptation) inherently balances parameter, compute, and memory efficiency, many subsequent variants trade off compute and memory efficiency and/or performance to further reduce fine-tuning parameters. To address this limitation and unify parameter-efficient and representation-efficient fine-tuning, we propose Weight-Generative Fine-Tuning (WeGeFT, pronounced wee-gift), a novel approach that learns to generate fine-tuning weights directly from the pretrained weights. WeGeFT employs a simple low-rank formulation consisting of two linear layers, either shared across multiple layers of the pretrained model or individually learned for different layers. This design achieves multifaceted efficiency in parameters, representations, compute, and memory, while maintaining or exceeding the performance of LoRA and its variants. Extensive experiments on commonsense reasoning, arithmetic reasoning, instruction following, code generation, and visual recognition verify the effectiveness of our proposed WeGeFT.
more » « less
Full Text Available
Relay Assisted Cooperative Ambient Backscatter Communication With Hybrid Long-Short Packets

https://doi.org/10.1109/TVT.2024.3384080

Song, Xi; Han, Dongsheng; Shi, Liqin; Sun, Haijian; Hu, Rose Qingyang (April 2024, IEEE Transactions on Vehicular Technology)

Full Text Available
PaCa-ViT: Learning Patch-to-Cluster Attention in Vision Transformers

https://doi.org/10.1109/CVPR52729.2023.01781

Grainger, Ryan; Paniagua, Thomas; Song, Xi; Cuntoor, Naresh; Lee, Mun Wai; Wu, Tianfu (June 2023, IEEE)

Vision Transformers (ViTs) are built on the assumption of treating image patches as “visual tokens” and learn patch-to-patch attention. The patch embedding based tokenizer has a semantic gap with respect to its counterpart, the textual tokenizer. The patch-to-patch attention suffers from the quadratic complexity issue, and also makes it non-trivial to explain learned ViTs. To address these issues in ViT, this paper proposes to learn Patch-to-Cluster attention (PaCa) in ViT. Queries in our PaCa-ViT starts with patches, while keys and values are directly based on clustering (with a predefined small number of clusters). The clusters are learned end-to-end, leading to better tokenizers and inducing joint clustering-for-attention and attention-for-clustering for better and interpretable models. The quadratic complexity is relaxed to linear complexity. The proposed PaCa module is used in designing efficient and interpretable ViT backbones and semantic segmentation head networks. In experiments, the proposed methods are tested on ImageNet-1k image classification, MS-COCO object detection and instance segmentation and MIT-ADE20k semantic segmentation. Compared with the prior art, it obtains better performance in all the three benchmarks than the SWin [32] and the PVTs [47], [48] by significant margins in ImageNet-1k and MIT-ADE20k. It is also significantly more efficient than PVT models in MS-COCO and MIT-ADE20k due to the linear complexity. The learned clusters are semantically meaningful. Code and model checkpoints are available at https:/github.com/iVMCL/PaCaViT.
more » « less
Full Text Available
Matrix Completion under Low-Rank Missing Mechanism

https://doi.org/10.5705/ss.202019.0196

Mao, Xiaojun; Wong, Raymond K.; Chen, Song Xi (January 2021, Statistica Sinica)

Full Text Available
High-dimensional empirical likelihood inference

https://doi.org/10.1093/biomet/asaa051

Chang, Jinyuan; Chen, Song Xi; Tang, Cheng Yong; Wu, Tong Tong (October 2020, Biometrika)
null (Ed.)
Summary High-dimensional statistical inference with general estimating equations is challenging and remains little explored. We study two problems in the area: confidence set estimation for multiple components of the model parameters, and model specifications tests. First, we propose to construct a new set of estimating equations such that the impact from estimating the high-dimensional nuisance parameters becomes asymptotically negligible. The new construction enables us to estimate a valid confidence region by empirical likelihood ratio. Second, we propose a test statistic as the maximum of the marginal empirical likelihood ratios to quantify data evidence against the model specification. Our theory establishes the validity of the proposed empirical likelihood approaches, accommodating over-identification and exponentially growing data dimensionality. Numerical studies demonstrate promising performance and potential practical benefits of the new methods.
more » « less
Full Text Available
Towards Interpretable Object Detection by Unfolding Latent Structures

Wu, Tianfu; Song, Xi (October 2019, IEEE International Conference on Computer Vision)

This paper first proposes a method of formulating model interpretability in visual understanding tasks based on the idea of unfolding latent structures. It then presents a case study in object detection using popular two-stage region-based convolutional neural network (i.e., R-CNN) detection systems. The proposed method focuses on weakly-supervised extractive rationale generation, that is learning to unfold latent discriminative part configurations of object instances automatically and simultaneously in detection without using any supervision for part configurations. It utilizes a top-down hierarchical and compositional grammar model embedded in a directed acyclic AND-OR Graph (AOG) to explore and unfold the space of latent part configurations of regions of interest (RoIs). It presents an AOGParsing operator that seamlessly integrates with the RoIPooling /RoIAlign operator widely used in R-CNN and is trained end-to-end. In object detection, a bounding box is interpreted by the best parse tree derived from the AOG on-the-fly, which is treated as the qualitatively extractive rationale generated for interpreting detection. In experiments, Faster R-CNN is used to test the proposed method on the PASCAL VOC 2007 and the COCO 2017 object detection datasets. The experimental results show that the proposed method can compute promising latent structures without hurting the performance.
more » « less
Full Text Available
Matrix Completion with Covariate Information

https://doi.org/10.1080/01621459.2017.1389740

Mao, Xiaojun; Chen, Song Xi; Wong, Raymond K. (October 2017, Journal of the American Statistical Association)

Full Text Available

Search for: All records